AITopics | shape score

Collaborating Authors

shape score

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Closer Look on Memorization in Tabular Diffusion Model: A Data-Centric Perspective

Fang, Zhengyu, Jiang, Zhimeng, Chen, Huiyuan, Zhang, Xiaoge, Tang, Kaiyu, Li, Xiao, Li, Jing

arXiv.org Artificial IntelligenceAug-5-2025

Diffusion models have shown strong performance in generating high-quality tabular data, but they carry privacy risks by reproducing exact training samples. While prior work focuses on dataset-level augmentation to reduce memorization, little is known about which individual samples contribute most. We present the first data-centric study of memorization dynamics in tabular diffusion models. We quantify memorization for each real sample based on how many generated samples are flagged as replicas, using a relative distance ratio. Our empirical analysis reveals a heavy-tailed distribution of memorization counts: a small subset of samples contributes disproportionately to leakage, confirmed via sample-removal experiments. To understand this, we divide real samples into top- and non-top-memorized groups and analyze their training-time behaviors. We track when each sample is first memorized and monitor per-epoch memorization intensity (AUC). Memorized samples are memorized slightly earlier and show stronger signals in early training. Based on these insights, we propose DynamicCut, a two-stage, model-agnostic mitigation method: (a) rank samples by epoch-wise intensity, (b) prune a tunable top fraction, and (c) retrain on the filtered dataset. Across multiple tabular datasets and models, DynamicCut reduces memorization with minimal impact on data diversity and downstream performance. It also complements augmentation-based defenses. Furthermore, DynamicCut enables cross-model transferability: high-ranked samples identified from one model (e.g., a diffusion model) are also effective for reducing memorization when removed from others, such as GANs and VAEs.

artificial intelligence, machine learning, memorization, (17 more...)

arXiv.org Artificial Intelligence

2505.22322

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.68)
Banking & Finance (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

Understanding and Mitigating Memorization in Diffusion Models for Tabular Data

Fang, Zhengyu, Jiang, Zhimeng, Chen, Huiyuan, Li, Xiao, Li, Jing

arXiv.org Artificial IntelligenceDec-14-2024

Tabular data generation has attracted significant research interest in recent years, with the tabular diffusion models greatly improving the quality of synthetic data. However, while memorization, where models inadvertently replicate exact or near-identical training data, has been thoroughly investigated in image and text generation, its effects on tabular data remain largely unexplored. In this paper, we conduct the first comprehensive investigation of memorization phenomena in diffusion models for tabular data. Our empirical analysis reveals that memorization appears in tabular diffusion models and increases with larger training epochs. We further examine the influence of factors such as dataset sizes, feature dimensions, and different diffusion models on memorization. Additionally, we provide a theoretical explanation for why memorization occurs in tabular diffusion models. To address this issue, we propose TabCutMix, a simple yet effective data augmentation technique that exchanges randomly selected feature segments between random same-class training sample pairs. Building upon this, we introduce TabCutMixPlus, an enhanced method that clusters features based on feature correlations and ensures that features within the same cluster are exchanged together during augmentation. This clustering mechanism mitigates out-of-distribution (OOD) generation issues by maintaining feature coherence. Experimental results across various datasets and diffusion models demonstrate that TabCutMix effectively mitigates memorization while maintaining high-quality data generation.

artificial intelligence, dataset, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.11044

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
North America > United States > Ohio > Cuyahoga County > Cleveland (0.04)
Europe > Germany (0.04)
Asia > Taiwan (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Banking & Finance > Credit (0.67)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

Refining the Optimization Target for Automatic Univariate Time Series Anomaly Detection in Monitoring Services

Dong, Manqing, Zhao, Zhanxiang, Geng, Yitong, Li, Wentao, Wang, Wei, Jiang, Huai

arXiv.org Artificial IntelligenceJul-20-2023

Time series anomaly detection is crucial for industrial monitoring services that handle a large volume of data, aiming to ensure reliability and optimize system performance. Existing methods often require extensive labeled resources and manual parameter selection, highlighting the need for automation. This paper proposes a comprehensive framework for automatic parameter optimization in time series anomaly detection models. The framework introduces three optimization targets: prediction score, shape score, and sensitivity score, which can be easily adapted to different model backbones without prior knowledge or manual labeling efforts. The proposed framework has been successfully applied online for over six months, serving more than 50,000 time series every minute. It simplifies the user's experience by requiring only an expected sensitive value, offering a user-friendly interface, and achieving desired detection results. Extensive evaluations conducted on public datasets and comparison with other methods further confirm the effectiveness of the proposed framework.

data mining, machine learning, shape score, (17 more...)

arXiv.org Artificial Intelligence

2307.10653

Country: North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

DeSIGN: Design Inspiration from Generative Networks

Sbai, Othman, Elhoseiny, Mohamed, Bordes, Antoine, LeCun, Yann, Couprie, Camille

arXiv.org Machine LearningApr-3-2018

Can an algorithm create original and compelling fashion designs to serve as an inspirational assistant? To help answer this question, we design and investigate different image generation models associated with different loss functions to boost creativity in fashion generation. The dimensions of our explorations include: (i) different Generative Adversarial Networks architectures that start from noise vectors to generate fashion items, (ii) a new loss function that encourages creativity, and (iii) a generation process following the key elements of fashion design (disentangling shape and texture makers). A key challenge of this study is the evaluation of generated designs and the retrieval of best ones, hence we put together an evaluation protocol associating automatic metrics and human experimental studies that we hope will help ease future research. We show that our proposed creativity loss yields better overall appreciation than the one employed in Creative Adversarial Networks. In the end, about 61% of our images are thought to be created by human designers rather than by a computer while also being considered original per our human subject experiments, and our proposed loss scores the highest compared to existing losses in both novelty and likability.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Machine Learning

1804.00921

Genre: Research Report > New Finding (0.66)

Industry: Textiles, Apparel & Luxury Goods (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback